Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing

Authors

Matthaios Olma

Manos Karpathiotakis

Ioannis Alagiannis

Manos Athanassoulis

Anastasia Ailamaki

Abstract

The constant flux of data and queries alike has been pushing the boundaries of data analysis systems. The increasing size of raw data files has made data loading an expensive operation that delays the data-to-insight time. Hence, recent in-situ query processing systems operate directly over raw data, alleviating the loading cost. At the same time, analytical workloads comprise an increasing number of queries. Typically, each query focuses on a constantly shifting, yet small, range. Minimizing the workload latency now requires bringing the benefits of indexing to in-situ query processing. In this paper, we present Slalom, an in-situ query engine that accommodates workload shifts by monitoring user access patterns. Slalom makes on-the-fly partitioning and indexing decisions, based on information collected by lightweight monitoring. Slalom has two key components: (i) an online partitioning and indexing scheme, and (ii) a partitioning and indexing tuner tailored for in-situ query engines. When compared to the state of the art, Slalom offers performance benefits by taking into account user query patterns to (a) logically partition raw data files and (b) build lightweight partition-specific indexes for each partition. Due to its lightweight and adaptive nature, Slalom achieves efficient accesses to raw data with minimal memory consumption. Our experimentation with both micro-benchmarks and real-life workloads shows that Slalom outperforms state-of-the-art in-situ engines (3-10×), and achieves query response times comparable to a fully indexed DBMS, offering much lower (~3×) cumulative query execution times for query workloads with increasing size and unpredictable access patterns.
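To make the idea of logical partitioning with per-partition lightweight indexes more concrete, the following is a minimal toy sketch in Python. It is not Slalom's algorithm: the class `PartitionedRawFile`, the fixed-size partitions, and the min/max summary used as the "index" are all assumptions chosen for illustration. It only shows how metadata gathered while answering one query can let a later query skip most of a raw file.

```python
class PartitionedRawFile:
    """Toy in-situ reader (hypothetical, not Slalom itself).

    The raw CSV text is split into fixed-size logical partitions. The first
    time a query scans a partition, a min/max summary of the first column is
    recorded; later queries use it to skip partitions that cannot match.
    """

    def __init__(self, raw_text, rows_per_partition):
        self.lines = raw_text.strip().splitlines()
        self.rows_per_partition = rows_per_partition
        self.summaries = {}    # partition id -> (min, max), built lazily
        self.rows_parsed = 0   # counts how many raw rows were actually parsed

    def _partitions(self):
        # Logical partitioning: only line offsets are used, the data is not moved.
        step = self.rows_per_partition
        for start in range(0, len(self.lines), step):
            yield start // step, self.lines[start:start + step]

    def range_query(self, lo, hi):
        """Return first-column values v with lo <= v <= hi."""
        result = []
        for pid, rows in self._partitions():
            summary = self.summaries.get(pid)
            if summary is not None and (summary[1] < lo or summary[0] > hi):
                continue  # the summary proves this partition holds no match
            values = [int(row.split(",")[0]) for row in rows]
            self.rows_parsed += len(values)
            self.summaries[pid] = (min(values), max(values))
            result.extend(v for v in values if lo <= v <= hi)
        return result


# Usage: 100 rows, 10 rows per partition.
raw = "\n".join(f"{i},payload" for i in range(100))
reader = PartitionedRawFile(raw, rows_per_partition=10)

first = reader.range_query(20, 25)   # no summaries yet: every row is parsed
parsed_first = reader.rows_parsed

reader.rows_parsed = 0
second = reader.range_query(42, 44)  # summaries now allow partition skipping
parsed_second = reader.rows_parsed
```

In this sketch the first query pays the full scan cost and the second touches a single partition. A real engine would additionally decide, per partition and per workload, whether and which index to build, which this toy does not attempt.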


Related resources

Adaptive partitioning and indexing for raw data querying

The traditional database management system approach to data analytics assumes that the input is first loaded into the DBMS and then queried. However, data analytics depends on interaction with the data analyst, and as data collections grow larger and larger, data loading becomes a bottleneck that incurs a significant data-to-query delay. In this paper, we examine the NoDB paradigm, which...

Design and Evaluation of a Method for Partitioning and Offloading Web-based Applications in Mobile Systems with Bandwidth Constraints

Computation offloading is known to be among the effective solutions for running heavy applications on smart mobile devices. However, irregular changes in the mobile data rate directly affect code partitioning while offloading is in progress. It is believed that once rate-adaptive partitioning is performed, the replication of such substantial processes due to bandwidth fluctuation can be avoid...

Multi-Output Adaptive Neuro-Fuzzy Inference System for Prediction of Dissolved Metal Levels in Acid Rock Drainage: a Case Study

Pyrite oxidation, Acid Rock Drainage (ARD) generation, and associated release and transport of toxic metals are a major environmental concern for the mining industry. Estimation of the metal loading in ARD is a major task in developing an appropriate remediation strategy. In this study, an expert system, the Multi-Output Adaptive Neuro-Fuzzy Inference System (MANFIS), was used for estimation of...

A Comprehensive Study of iDistance Partitioning Strategies for kNN Queries and High-Dimensional Data Indexing

Efficient database indexing and information retrieval tasks such as k-nearest neighbor (kNN) search still remain difficult challenges in large-scale and high-dimensional data. In this work, we perform the first comprehensive analysis of different partitioning strategies for the state-of-the-art high-dimensional indexing technique iDistance. This work greatly extends the discussion of why certa...

Hydrograph Estimation based on Various Components of Rainfall Using Adaptive Neuro-Fuzzy Inference System in Kasilian Watershed

Flood hydrograph preparation and estimation provide comprehensive information for soil and water managers and planners. However, a hydrograph cannot easily be prepared for every watershed. Therefore, estimating and modeling flood hydrographs from available rainfall data appears to be necessary. The study area is located in the Kasilian representative watershed in Mazandaran province compr...


Journal: PVLDB

Volume: 10

Publication date: 2017
